Acoustic word embedding model based on Bi-LSTM and convolutional-Transformer
Yunyun GAO, Lasheng ZHAO, Qiang ZHANG
Journal of Computer Applications    2024, 44 (1): 123-128.   DOI: 10.11772/j.issn.1001-9081.2023010062
Abstract

In Query-by-Example Spoken Term Detection (QbE-STD), the Acoustic Word Embedding (AWE) representations extracted by a Convolutional Neural Network (CNN) or a Recurrent Neural Network (RNN) alone capture limited speech information. To represent speech content better and improve model performance, an acoustic word embedding model based on Bi-directional Long Short-Term Memory (Bi-LSTM) and a convolutional-Transformer was proposed. Firstly, Bi-LSTM was used to extract features and model the speech sequences, with stacked layers improving the model's learning ability. Secondly, to learn local information while capturing global information, a CNN and a Transformer encoder were connected in parallel to form the convolutional-Transformer, which exploited the complementary strengths of the two branches in feature extraction to aggregate more effective information and improve the discriminability of the embeddings. Under the constraint of a contrastive loss, the Average Precision (AP) of the proposed model reaches 94.36%, which is 1.76% higher than that of the attention-based Bi-LSTM model. The experimental results show that the proposed model effectively improves performance on QbE-STD.
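The abstract states that the embeddings are trained under a contrastive loss, so that embeddings of spoken examples of the same word are pulled together while those of different words are pushed apart. The exact formulation used in the paper is not given; the sketch below shows one common margin-based triplet variant over cosine similarity, purely as an illustration (the `margin` value and the cosine-similarity choice are assumptions, not the authors' reported settings).

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def contrastive_loss(anchor, positive, negative, margin=0.5):
    """Margin-based contrastive (triplet) loss on AWEs.

    anchor/positive are embeddings of two utterances of the SAME word;
    negative is an embedding of a DIFFERENT word. The loss is zero once
    the positive pair is at least `margin` more similar than the
    negative pair, otherwise it penalizes the similarity gap.
    """
    pos_sim = cosine(anchor, positive)
    neg_sim = cosine(anchor, negative)
    return max(0.0, margin + neg_sim - pos_sim)

# Illustrative embeddings (hypothetical 2-D vectors, not real AWEs):
a = np.array([1.0, 0.0])
p = np.array([1.0, 0.0])   # same word, identical embedding
n = np.array([0.0, 1.0])   # different word, orthogonal embedding
print(contrastive_loss(a, p, n))  # → 0.0 (pair already well separated)
```

In QbE-STD retrieval itself, the same cosine similarity between the query's embedding and each candidate segment's embedding is what ranks the detection hypotheses, which is why improving the discriminability of the embeddings translates directly into a higher AP.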
